Closed Bug 1356317 Opened 7 years ago Closed 7 years ago

Support remote layer trees in popups on Windows

Categories

(Core :: Graphics: Layers, enhancement, P1)

Unspecified
Windows
enhancement

Tracking

()

VERIFIED FIXED
mozilla56
Performance Impact high
Tracking Status
firefox55 --- fixed
firefox56 --- verified

People

(Reporter: kmag, Assigned: spohl)

References

(Blocks 2 open bugs)

Details

(Whiteboard: tpi:+)

Attachments

(6 files, 4 obsolete files)

Since bug 630346, transparent popups on Linux use a BasicLayerManager to make it easier to compute shape masks. Unfortunately, as a side-effect of this, it's not possible for them to host remote frameloaders.

We need to be able to show remote browsers in popups for OOP WebExtensions, so I think we need to find a better solution for handling transparency here.
Blocks: 1357487
Priority: -- → P3
Whiteboard: tpi:+
Marking as quantum flow p1 because this blocks our ability to run webextensions in a seperate process. However this is only for Linux and can understand if that gets triaged down because of our number of users on Linux.
Whiteboard: tpi:+ → tpi:+[qf:p1]
Whiteboard: tpi:+[qf:p1] → tpi:+[qf]
Linux only so p3
Whiteboard: tpi:+[qf] → tpi:+[qf:p3]
It turns out I was wrong, and we also disable compositing for popups on other platforms. And force-enabling it causes weird issues with drop shadows and transparent areas.

I'm not sure what the solution for this is. I suppose we either need to find a way to support compositing in popups properly, or find a way to support RefLayers in BasicLayerManager. Perhaps we could have the compositor render to an intermediate surface, and draw that to the BLM.

Or we could possibly just embed a composited widget inside the popup widget, but I don't know if we could clip it properly for rounded corner support without running into the same transparency issues (and it feels an awful lot like windowed mode plugins).
Component: Widget: Gtk → Graphics: Layers
Priority: P3 → --
Summary: Support remote layer trees in popups on Linux → Support remote layer trees in popups
Whiteboard: tpi:+[qf:p3] → tpi:+[qf]
bas said he was going to look at some options here, since QF is highly motivated to have oop webextensions for 57.
Flags: needinfo?(bas)
Whiteboard: tpi:+[qf] → tpi:+[qf:p1]
It's not Linux-only. See comment 3.
Whiteboard: tpi:+[qf:p3] → tpi:+[qf:p1]
Ack, sorry for thrashing.
Flags: needinfo?(bas)
(In reply to Mike Conley (:mconley) from comment #7)
> Ack, sorry for thrashing.

Wait, I'm unclear now Mike, we -do- disable compositing for Pop-Ups on other platform. However, from what I understand the debate is about whether we need this for OOP WebExtensions. Mike, can you confirm that we do?
Flags: needinfo?(bas) → needinfo?(mconley)
(In reply to Bas Schouten (:bas.schouten) from comment #8)
> Wait, I'm unclear now Mike, we -do- disable compositing for Pop-Ups on other
> platform. However, from what I understand the debate is about whether we
> need this for OOP WebExtensions. Mike, can you confirm that we do?

I can confirm that we do. In order to enable OOP WebExtensions, we need to be able to render remote layer trees in popups, and we currently can't do that in any widget that doesn't support compositing.
Flags: needinfo?(mconley)
Assignee: nobody → bas
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #9)
> (In reply to Bas Schouten (:bas.schouten) from comment #8)
> > Wait, I'm unclear now Mike, we -do- disable compositing for Pop-Ups on other
> > platform. However, from what I understand the debate is about whether we
> > need this for OOP WebExtensions. Mike, can you confirm that we do?
> 
> I can confirm that we do. In order to enable OOP WebExtensions, we need to
> be able to render remote layer trees in popups, and we currently can't do
> that in any widget that doesn't support compositing.

Will it be sufficient for OOP web extensions if we supply a XUL attribute that can be set on a window to force using OMTC for this window?
Flags: needinfo?(mconley)
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #3)
> It turns out I was wrong, and we also disable compositing for popups on
> other platforms. And force-enabling it causes weird issues with drop shadows
> and transparent areas.

Did you try this on windows? Where did you see this issue, and if so which windows version? It seems to me like we're already disabling acceleration on Windows for transparent widgets. Using OMTC for it then -should- be okay afaict.
(In reply to Bas Schouten (:bas.schouten) from comment #10)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #9)
> > (In reply to Bas Schouten (:bas.schouten) from comment #8)
> > > Wait, I'm unclear now Mike, we -do- disable compositing for Pop-Ups on other
> > > platform. However, from what I understand the debate is about whether we
> > > need this for OOP WebExtensions. Mike, can you confirm that we do?
> > 
> > I can confirm that we do. In order to enable OOP WebExtensions, we need to
> > be able to render remote layer trees in popups, and we currently can't do
> > that in any widget that doesn't support compositing.
> 
> Will it be sufficient for OOP web extensions if we supply a XUL attribute
> that can be set on a window to force using OMTC for this window?

No. I was considering doing that as a temporary workaround, but forcing compositing turns the transparent areas of the popups black, at least on Linux and OS-X, and causes strange drop shadow issues on OS-X.

(In reply to Bas Schouten (:bas.schouten) from comment #11)
> Did you try this on windows? Where did you see this issue, and if so which
> windows version? It seems to me like we're already disabling acceleration on
> Windows for transparent widgets. Using OMTC for it then -should- be okay
> afaict.

I haven't tested on Windows yet, only Linux and OS-X. My Windows VM is broken at the moment and I'm waiting for Windows test hardware.
Flags: needinfo?(kmaglione+bmo)
kmag answered, clearing ni.
Flags: needinfo?(mconley)
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #12)
> (In reply to Bas Schouten (:bas.schouten) from comment #10)
> > (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #9)
> > > (In reply to Bas Schouten (:bas.schouten) from comment #8)
> > > > Wait, I'm unclear now Mike, we -do- disable compositing for Pop-Ups on other
> > > > platform. However, from what I understand the debate is about whether we
> > > > need this for OOP WebExtensions. Mike, can you confirm that we do?
> > > 
> > > I can confirm that we do. In order to enable OOP WebExtensions, we need to
> > > be able to render remote layer trees in popups, and we currently can't do
> > > that in any widget that doesn't support compositing.
> > 
> > Will it be sufficient for OOP web extensions if we supply a XUL attribute
> > that can be set on a window to force using OMTC for this window?
> 
> No. I was considering doing that as a temporary workaround, but forcing
> compositing turns the transparent areas of the popups black, at least on
> Linux and OS-X, and causes strange drop shadow issues on OS-X.
> 
> (In reply to Bas Schouten (:bas.schouten) from comment #11)
> > Did you try this on windows? Where did you see this issue, and if so which
> > windows version? It seems to me like we're already disabling acceleration on
> > Windows for transparent widgets. Using OMTC for it then -should- be okay
> > afaict.
> 
> I haven't tested on Windows yet, only Linux and OS-X. My Windows VM is
> broken at the moment and I'm waiting for Windows test hardware.

I will run some more tests, if those are successful, I will enable this on Windows at least. For OSX we probably need input from Jeff.
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar) → needinfo?(mstange)
For Mac we need to do some of the work from bug 1291457 and then some more work to get the shadow right when we enable OpenGL compositing for them. On Mac, using OpenGL is a prerequisite for off-main-thread compositing, which is a prerequisite for rendering remote content.
We should probably have a separate bug for the Mac part of this.
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #15)
> For Mac we need to do some of the work from bug 1291457 and then some more
> work to get the shadow right when we enable OpenGL compositing for them. On
> Mac, using OpenGL is a prerequisite for off-main-thread compositing, which
> is a prerequisite for rendering remote content.
> We should probably have a separate bug for the Mac part of this.

That sounds like a good idea, I can't resolve that part of the problem, I'll make sure Windows gets done. Is this a fundamental limitation about software composition on OS X, or is this just that we don't have code to composite non-OGL off the main thread?
I have some initial patches to fix bug 1362621 that add a remote flag to popups and widgets that have remote browsers, and force enable compositing and APZ for those widgets. They don't fix any of the rendering issues for composited, but should be good enough for testing.

I'm going to upload them for review in a separate bug you (bas and mstange) for review, and if they overlap or conflict with any of the work that you're doing for this, I'll update them to deconflict.
Depends on: 1365660
This seems to be mostly fine on try, there is one issue here:

https://treeherder.mozilla.org/logviewer.html#?job_id=99827640&repo=try&lineNumber=7639

There's a bunch of those crashes in M-e10s, Mike or Jim, do you have any idea what these are about? These don't look graphics related so it seems like they'd somehow be related to something in widget and what happens on windows when popups get OMTC.
Flags: needinfo?(mconley)
Flags: needinfo?(jmathies)
Priority: -- → P1
(In reply to Bas Schouten (:bas.schouten) from comment #18)
> This seems to be mostly fine on try, there is one issue here:
> 
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=99827640&repo=try&lineNumber=7639
> 
> There's a bunch of those crashes in M-e10s, Mike or Jim, do you have any
> idea what these are about? These don't look graphics related so it seems
> like they'd somehow be related to something in widget and what happens on
> windows when popups get OMTC.

The crash stack indicates the unsafe assertion happens in XPCOM shutdown in ~nsMemoryReporterManager() for graphics surface objects of type gfxASurface.

xul.dll!SurfaceMemoryReporter::Release() [gfxASurface.cpp:70305d34138f : 503 + 0x23]

The assertion is associated with nsAutoOwningThread::AssertCurrentThreadOwnsMe so some sort of thread ownership of memory issue here.

Looks like it should be pretty easy to reproduce.
Flags: needinfo?(jmathies) → needinfo?(bas)
Flags: needinfo?(mconley)
Hi Bas,

Do you have an estimate on when this bug will be resolved?  This is the primary remaining blocker for landing WE code for speed performance fixes.
> Do you have an estimate on when this bug will be resolved?  This is the
> primary remaining blocker for landing WE code for speed performance fixes.

According to the comments here, there's work on various platforms to do:

OSX: parts of bug 1291457 + "more work" for shadows - comment 15
Win: Gfx changes by Bas plus dealing with fallout from those changes - comment 18, comment 19
GTK: unknown amount of work.

None of this work is currently owned. Is this bug an absolute blocker? if so, for what and when? Right now this bug isn't on anyone's radar so it will not get fixed unless it gets prioritized within teams who have the engineering talent to work on it.
Flags: needinfo?(sescalante)
This bug is an absolute blocker, yes. We can't enable remote WebExtensions until we can display remote content in popups.

At the moment, though, the only platform where this is a P1 is Windows. We can enable OOP extensions on a per-platform basis, so as long as we can support Windows in the short term, we can defer work on other platforms until after 57.
Flags: needinfo?(sescalante)
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #22)
> This bug is an absolute blocker, yes. We can't enable remote WebExtensions
> until we can display remote content in popups.
> 
> At the moment, though, the only platform where this is a P1 is Windows. We
> can enable OOP extensions on a per-platform basis, so as long as we can
> support Windows in the short term, we can defer work on other platforms
> until after 57.

Deferring thanks to the various work arounds in bug 1365660?

(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #22)
> This bug is an absolute blocker, yes. We can't enable remote WebExtensions
> until we can display remote content in popups.
> 
> At the moment, though, the only platform where this is a P1 is Windows. We
> can enable OOP extensions on a per-platform basis, so as long as we can
> support Windows in the short term, we can defer work on other platforms
> until after 57.

Ok sounds good. Lets make this bug about the change on Windows only and file follow ups on OSX and linux.
(In reply to Jim Mathies [:jimm] from comment #23)
> Deferring thanks to the various work arounds in bug 1365660?

No, those will only be useful for early testers and QA until the UI issues are fixed. Except perhaps on Linux, where things seem to work surprisingly well as long as we disable transparency.

But we can defer enabling OOP on other platforms until after 57, since those platforms aren't part of the Quantum release criteria.
OS: Unspecified → Windows
Summary: Support remote layer trees in popups → Support remote layer trees in popups on Windows
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c42dea515f79932ef9756ebd8ad4044f46ff3850

It looks like marking the surface memory reporter's refcnt as threadsafe is enough to fix the memory management issues:

https://hg.mozilla.org/try/rev/54f3ac2b9bf0d50aede35c954e7c3c44ca4b78e8#l1.12

I think this a legitimate change, but I'm not sure. Nick, do you know?


With that fixed, though, there are some other issued. One with the APZ child apparently sending a message with the wrong layer ID:

https://treeherder.mozilla.org/logviewer.html#?job_id=102458935&repo=try&lineNumber=3014

And one where we apparently try to draw to a target that's already being drawn to:

https://treeherder.mozilla.org/logviewer.html#?job_id=102457880&repo=try&lineNumber=15085


I suspect that those may have to do with something getting confused between the main window's widget and the popup's.
Flags: needinfo?(n.nethercote)
Ah, and, I think that at least some of those issues are from recreating the menupopup's widget when we change its "remote" attribute. It didn't cause any problems when we were only switching from non-composited to composited, but there seem to be some synchronization issues when we switch from one composited widget to another:

https://treeherder.mozilla.org/logviewer.html#?job_id=102457774&repo=try&lineNumber=9541
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #25)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c42dea515f79932ef9756ebd8ad4044f46ff3850
> 
> It looks like marking the surface memory reporter's refcnt as threadsafe is
> enough to fix the memory management issues:
> 
> https://hg.mozilla.org/try/rev/54f3ac2b9bf0d50aede35c954e7c3c44ca4b78e8#l1.12
> 
> I think this a legitimate change, but I'm not sure. Nick, do you know?

It certainly can't hurt. If that change alone is fixing the assertion then I say go for it.
Flags: needinfo?(n.nethercote)
Comment on attachment 8872476 [details]
[checked-in] Part 1 - Mark SurfaceMemoryReporter refcounting as threadsafe.

https://reviewboard.mozilla.org/r/144010/#review147722

r=me as is, but it would be nice to have a brief comment above the NS_DECL_THREADSAFE_ISUPPORTS explaining why thread safety is required.
Attachment #8872476 - Flags: review?(n.nethercote) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/03f5cefeae50760e32505de29136c8cee65eba43
Bug 1356317: Part 1 - Mark SurfaceMemoryReporter refcounting as threadsafe. r=njn
Keywords: leave-open
after talking with Milan and Bas, unassigning this bug from Bas.  Bas is the right person to validate fixes because Windows GFX is the impacted area, but the changes need to happen in the widget code.

(on osx mstange could be the right person to validate and on linux it's karlt)
Assignee: bas → nobody
Flags: needinfo?(bas)
No longer depends on: 1365660
Assigning while Stephen assesses the issue.
Assignee: nobody → spohl.mozilla.bugs
Keywords: leave-open
I kicked off a new try run to see where we stand:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b34297e8e0fd81c3e01757af47e547884c7a5af&selectedJob=107963440

We're dealing with bc5 and bc6 failures. Both of these are due to the fact that we fail this assertion:
Assertion failure: sIndirectLayerTrees[child].mParent->mOptions == mOptions, at z:/build/build/src/gfx/layers/ipc/CompositorBridgeParent.cpp:1584

This is because we only set popups to use APZ if they contain remote content and this assertion fails because we have APZ enabled for one compositor, but not the other. By reverting the change from attachment 8868663 [details] in bug 1365660 and always enabling APZ for popups, these try failures go away. Instead, we're now running into new assertion failures in devtools tests:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=00ca03c4612cb1742fd143fba50ecbee319b50b2&selectedJob=107969875

These failures seem to be due to the fact that we're trying to set a displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're not actually dealing with root content or a toplevel document. This causes us to fail the assertion that was added in attachment 8845767 [details] [diff] [review] from bug 1346109. By removing the assertion and only setting the displayport base when we're actually dealing with root content or toplevel documents, these failures go away and we have a green try run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=feee9d5fc05c28c768b9e06ddf54f8bc1c3751cd

Unfortunately, I can't tell with certainty that enabling APZ for all popups, and if only setting a displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're actually dealing with root content or toplevel documents, is acceptable, or if this indicates a deeper issue that we should fix instead. I'm going to post both patches and request some feedback.
Attached patch Enable APZ for all popups (obsolete) — Splinter Review
Attachment #8878891 - Flags: feedback?(kmaglione+bmo)
Comment on attachment 8878891 [details] [diff] [review]
Enable APZ for all popups

Redirecting. Kats, is this acceptable, or do we instead need to support adopting children from one compositor to another if they have different APZ options? I'm not sure how hard this would be.
Attachment #8878891 - Flags: feedback?(kmaglione+bmo) → feedback?(bugmail)
Comment on attachment 8878892 [details] [diff] [review]
Only set a displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're dealing with root content or toplevel documents

Redirecting. Kats, is this acceptable, or do we need to figure out what we're dealing with here (presumably not root content or toplevel documents) and set an appropriate displayport base instead? I'm not sure how hard this would be. Also see comment 34. Thanks!
Attachment #8878892 - Flags: feedback?(kmaglione+bmo) → feedback?(bugmail)
Thanks for the detailed comments explaining the current state of things!

(In reply to Stephen A Pohl [:spohl] from comment #34)
> We're dealing with bc5 and bc6 failures. Both of these are due to the fact
> that we fail this assertion:
> Assertion failure: sIndirectLayerTrees[child].mParent->mOptions == mOptions,
> at z:/build/build/src/gfx/layers/ipc/CompositorBridgeParent.cpp:1584
> 
> This is because we only set popups to use APZ if they contain remote content
> and this assertion fails because we have APZ enabled for one compositor, but
> not the other.

It's not clear to me which content is getting adopted from one compositor to the other. Can you elaborate on this a little bit?
Comment on attachment 8878891 [details] [diff] [review]
Enable APZ for all popups

Review of attachment 8878891 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Stephen A Pohl [:spohl] from comment #38)
> Kats, is this acceptable, or do we instead need to support
> adopting children from one compositor to another if they have different APZ
> options? I'm not sure how hard this would be.

Adopting children from one compositor to another with different APZ options shouldn't be too hard, but adopting from one compositor to another with arbitrary differences in their CompositorOptions will be very hard. So I would rather just turn on APZ in the popups like you're doing in your patch than try to handle this adoption case in the compositor. The main concern with turning on APZ in the popups is that performance might suffer because we now have more overhead in painting stuff in the popup. If the performance penalty is acceptable to you as a tradeoff for getting this working, then it's fine by me.
Attachment #8878891 - Flags: feedback?(bugmail) → feedback+
Comment on attachment 8878892 [details] [diff] [review]
Only set a displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're dealing with root content or toplevel documents

Review of attachment 8878892 [details] [diff] [review]:
-----------------------------------------------------------------

Redirecting this to :tnikkel. It could be that only setting the displayport base in these conditions will invalidate assumptions elsewhere in the code, I'm not really sure.
Attachment #8878892 - Flags: feedback?(bugmail) → feedback?(tnikkel)
Comment on attachment 8878892 [details] [diff] [review]
Only set a displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're dealing with root content or toplevel documents

(In reply to Stephen A Pohl [:spohl] from comment #34)
> These failures seem to be due to the fact that we're trying to set a
> displayport base in APZCCallbackHelper::InitializeRootDisplayport when we're
> not actually dealing with root content or a toplevel document. This causes
> us to fail the assertion that was added in attachment 8845767 [details] [diff] [review]
> [diff] [review] from bug 1346109. By removing the assertion and only setting
> the displayport base when we're actually dealing with root content or
> toplevel documents, these failures go away and we have a green try run:
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=feee9d5fc05c28c768b9e06ddf54f8bc1c3751cd
> 
> Unfortunately, I can't tell with certainty that enabling APZ for all popups,
> and if only setting a displayport base in
> APZCCallbackHelper::InitializeRootDisplayport when we're actually dealing
> with root content or toplevel documents, is acceptable, or if this indicates
> a deeper issue that we should fix instead. I'm going to post both patches
> and request some feedback.

From the tryrun it looks like APZCCallbackHelper::InitializeRootDisplayport is being called from ChromeProcessController::InitializeRoot. Do we expect a ChromeProcessController to be created for a widget whose associated document isn't a toplevel document in it's process? That feels unexpected to me.
Flags: needinfo?(bugmail)
Comment on attachment 8878891 [details] [diff] [review]
Enable APZ for all popups

Review of attachment 8878891 [details] [diff] [review]:
-----------------------------------------------------------------

I'm a bit worried about this, since we wound up seeing checkerboarding issues in autocomplete popups that had no remote content the first time I landed this.

Would it be enough to add a separate attribute ("enableapz" maybe) to popups that might ever create remote browsers, and never change UseAPZ for a popup after it's created?
(In reply to Timothy Nikkel (:tnikkel) from comment #43)
> From the tryrun it looks like APZCCallbackHelper::InitializeRootDisplayport
> is being called from ChromeProcessController::InitializeRoot. Do we expect a
> ChromeProcessController to be created for a widget whose associated document
> isn't a toplevel document in it's process? That feels unexpected to me.

It might be. Kris or Stephen, can you please expand on the stuff in comment 40? It's not really clear to me where the content is living and why it's getting transplanted and understanding that flow better would probably help answer this question.
Flags: needinfo?(spohl.mozilla.bugs)
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #45)
> It might be. Kris or Stephen, can you please expand on the stuff in comment
> 40? It's not really clear to me where the content is living and why it's
> getting transplanted and understanding that flow better would probably help
> answer this question.

At the moment, we currently only enable compositing for popups when they have the "remote" attribute, and when that attribute changes (which happens when we open an extension sub-view in an existing panel), we create a new widget with the new compositor settings. I assume that's what we're talking about here, but I haven't looked at it in-depth.

After these changes, we always use compositing on Windows, though, and we wind up creating a new widget where both the old and new widget use compositors. I think we should actually be able to skip this step when the existing widget already has compositing and APZ enabled, though, which may solve the problem on its own.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] from comment #46)
> At the moment, we currently only enable compositing for popups when they
> have the "remote" attribute, and when that attribute changes (which happens
> when we open an extension sub-view in an existing panel), we create a new
> widget with the new compositor settings. I assume that's what we're talking
> about here, but I haven't looked at it in-depth.

This seems plausible... maybe. Continuing along this line of reasoning, what might be happening is that when we create this new compositing widget, we also create a new layer manager go with it. The extension sub-view would be hosted inside the remote rendering machinery (RenderFrameParent/RenderFrameChild) and perhaps as part of setting that up, we're running through the code at [1]. That is currently the only place that sends the "AdoptChild" IPC message to the compositor, which we were seeing in comment 38. However, if this hypothesis is correct, then the "mLayerManager" at [1] would be a layer manager of a type that goes with the old non-compositing window, (i.e. probably a BasicLayerManager) as opposed to "lm" which would be a ClientLayerManager. We should be able to check for that and skip the AdoptChild call in that case. That would probably get things working again. The AdoptChild message is AFAIK only needed when dragging a tab from one window to another, and in that case both "lm" and "mLayerManager" are ClientLayerManager instances.

> After these changes,

Which changes?

> we always use compositing on Windows, though, and we
> wind up creating a new widget where both the old and new widget use
> compositors. I think we should actually be able to skip this step when the
> existing widget already has compositing and APZ enabled, though, which may
> solve the problem on its own.

If so, great. Would OS X and Linux still have this problem though? And do we care about those platforms for this purpose?

[1] http://searchfox.org/mozilla-central/rev/714606a8145636d93b116943d3a65a6a49d2acf8/layout/ipc/RenderFrameParent.cpp#227
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #47)
> (In reply to Kris Maglione [:kmag] from comment #46)
> > At the moment, we currently only enable compositing for popups when they
> > have the "remote" attribute, and when that attribute changes (which happens
> > when we open an extension sub-view in an existing panel), we create a new
> > widget with the new compositor settings. I assume that's what we're talking
> > about here, but I haven't looked at it in-depth.
>
> This seems plausible... maybe. Continuing along this line of reasoning, what
> might be happening is that when we create this new compositing widget, we
> also create a new layer manager go with it. The extension sub-view would be
> hosted inside the remote rendering machinery
> (RenderFrameParent/RenderFrameChild) and perhaps as part of setting that up,
> we're running through the code at [1].

I'm not sure that's possible. We currently always set the remote=true
attribute before inserting the browser, so the layer manager should have
already changed by then. However:

> ... The AdoptChild message is AFAIK only needed when dragging a tab from one
> window to another, and in that case both "lm" and "mLayerManager" are
> ClientLayerManager instances.

When loading popups that use the CustomizeableUI/PanelUI framework, we
currently create a hidden <browser> in a hidden <panel> to begin loading its
content before the panel appears. We then create a new <browser> in the final
popup and swap docshells. It looks we don't set the "remote" attribute on the
original, hidden panel, so it seems fairly likely that the docshell swap is in
fact the problem.

> > After these changes,
>
> Which changes?

Sorry, after the changes discussed above (comment 18), which would enable
compositing on all popups on Windows.

> > we always use compositing on Windows, though, and we
> > wind up creating a new widget where both the old and new widget use
> > compositors. I think we should actually be able to skip this step when the
> > existing widget already has compositing and APZ enabled, though, which may
> > solve the problem on its own.
>
> If so, great. Would OS X and Linux still have this problem though? And do we
> care about those platforms for this purpose?

We're not initially targeting OS-X and Linux for OOP extensions, so it's not a
concern at the moment. After 57, we'll need to add support, but in the mean
time, it's preffed off on those platforms.
(In reply to Kris Maglione [:kmag] from comment #48)
> When loading popups that use the CustomizeableUI/PanelUI framework, we
> currently create a hidden <browser> in a hidden <panel> to begin loading its
> content before the panel appears. We then create a new <browser> in the final
> popup and swap docshells. It looks we don't set the "remote" attribute on the
> original, hidden panel, so it seems fairly likely that the docshell swap is
> in
> fact the problem.

Can we check that setting the remote attribute on the hidden panel solves the AdoptChild problem? That would at least confirm this is what is happening.

> > Which changes?
> 
> Sorry, after the changes discussed above (comment 18), which would enable
> compositing on all popups on Windows.

Ah, makes sense.

> We're not initially targeting OS-X and Linux for OOP extensions, so it's not
> a
> concern at the moment. After 57, we'll need to add support, but in the mean
> time, it's preffed off on those platforms.

Ok.
Flags: needinfo?(kmaglione+bmo)
This is green on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ebc1dab110a1044dce604e8d9d8c58a43ffa3d9
Attachment #8878891 - Attachment is obsolete: true
Attachment #8878892 - Attachment is obsolete: true
Attachment #8878892 - Flags: feedback?(tnikkel)
Attachment #8881869 - Flags: review?(kmaglione+bmo)
Not sure if this should land as part of this bug, or if this tracked somewhere else.
Attachment #8881870 - Flags: review?(kmaglione+bmo)
Attachment #8881869 - Flags: review?(kmaglione+bmo) → review+
Comment on attachment 8881870 [details] [diff] [review]
Part 3: Enable OMTC on Windows

Review of attachment 8881870 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not the right person to review this. Redirecting to Bas.
Attachment #8881870 - Flags: review?(kmaglione+bmo) → review?(bas)
\o/ woot!
Comment on attachment 8881870 [details] [diff] [review]
Part 3: Enable OMTC on Windows

Forwarding as discussed. Thanks!
Attachment #8881870 - Flags: review?(bas) → review?(jmathies)
Comment on attachment 8881870 [details] [diff] [review]
Part 3: Enable OMTC on Windows

Review of attachment 8881870 [details] [diff] [review]:
-----------------------------------------------------------------

r+ patch originally posted by Bas.
Attachment #8881870 - Flags: review?(jmathies) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/b2f19a20a2ea for frequent Win7 debug assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=110901040&repo=mozilla-inbound - looks like your best chance of hitting it is in browser-chrome and devtools, particularly the odd combo of non-e10s browser-chrome and e10s devtools.
Flags: needinfo?(spohl.mozilla.bugs)
Looks like this WinCompositorWidget is getting initialized twice and we're missing a call to StartRemoteDrawing(). Hopefully not too hard to track down.

http://searchfox.org/mozilla-central/source/widget/windows/WinCompositorWidget.cpp#80
(In reply to Jim Mathies [:jimm] from comment #59)
> Looks like this WinCompositorWidget is getting initialized twice and we're
> missing a call to StartRemoteDrawing(). Hopefully not too hard to track down.
> 
> http://searchfox.org/mozilla-central/source/widget/windows/
> WinCompositorWidget.cpp#80

s/StartRemoteDrawing/EndRemoteDrawing
I've been looking into these failures since the patches got backed out last night. I agree that we seem to miss a call to EndRemoteDrawing somewhere. Unfortunately, these failures don't show up in the try push that was submitted in comment 51.

This patch simply calls EndRemoteDrawing at the beginning of StartRemoteDrawing if necessary as a possible workaround. However, I'm still trying to figure out where we might actually be missing the call to EndRemoteDrawing, since we never expected to call StartRemoteDrawing when mCompositeDC is non-null. One thing that stands out is that all logs of the failed test runs show the following message just before the assertion failure:
[GFX1-]: Attempt to create DrawTarget for invalid surface. Size(372,108) Cairo Status: 1

This does not show up in successful test runs.
There is also an issue with our dt8 tests. We seem to simply hang here:
https://dxr.mozilla.org/mozilla-central/source/devtools/client/inspector/shared/test/browser_styleinspector_tooltip-longhand-fontfamily.js#80

I have been unable to reproduce this locally, and this didn't happen on try in comment 51. I pushed to try again to see where we stand:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e7f1c90bb4c28e769669524c6efc616236e0268
Creating a draw target can fail, which can cause us to return nullptr from StartRemoteDrawing(). We shouldn't set mCompositeDC when this is the case.

I kicked off a new try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbc60456990363ca9784b7f763f8d9e14ac0c8cc

This should fix the asserts that we've seen in debug bc3 tests. I am not sure if this will have any effect on the dt8 test failures.
Attachment #8882628 - Attachment is obsolete: true
Flags: needinfo?(spohl.mozilla.bugs)
Attachment #8882729 - Flags: review?(jmathies)
Fixed a copy/paste error in previous patch. See comment 63 for further explanations.

Kicked off new try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5ec55a929183033f605c5d01d2bff1a7d48eef46
Attachment #8882729 - Attachment is obsolete: true
Attachment #8882729 - Flags: review?(jmathies)
Attachment #8882730 - Flags: review?(jmathies)
With part 4 applied, the bc3 failures appear to be fixed. What is left to investigate and fix are the timeout failures in devtools/client/inspector/shared/test/browser_styleinspector_tooltip-longhand-fontfamily.js. I've tried increasing the timeout interval in a try push[1] to see if this changes anything, but the test failures still occur:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6302bb50de9de2d474f8dc3d8edacfce7137418c&selectedJob=111335474

As mentioned in comment 62, we seem to simply hang here:
https://dxr.mozilla.org/mozilla-central/source/devtools/client/inspector/shared/test/browser_styleinspector_tooltip-longhand-fontfamily.js#80

I have been unable to reproduce this locally, and this didn't happen on try in comment 51. It is possible that a push unrelated to this bug here may have introduced this between the try push in comment 51 and now.
Attachment #8872476 - Attachment description: Bug 1356317: Part 1 - Mark SurfaceMemoryReporter refcounting as threadsafe. → [checked-in] Part 1 - Mark SurfaceMemoryReporter refcounting as threadsafe.
Current failure appares to be related to a popup that doesn't hide on mouseout. 

http://searchfox.org/mozilla-central/source/devtools/client/inspector/shared/test/browser_styleinspector_tooltip-longhand-fontfamily.js#84

The popup in question looks weird, it's not attached to the element, and is mostly off screen. I'll try repoing locally.

https://public-artifacts.taskcluster.net/FbkAOZMtRJOqgYT37YGXxw/0/public/test_info/mozilla-test-fail-screenshot_u_ppdz.png

Interesting too that the first use of assertTooltipHiddenOnMouseOut succeeds in  the prior task.
(In reply to Jim Mathies [:jimm] from comment #66)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3236a119ddee3ea9dacddb22e39c5aec89cb1a0d

OOMs and hangs in xul.dll!mozilla::layers::PCompositorBridgeChild::SendFlushRendering()
Looks like this may just be flakiness triggered by added latency in popup management. I'm testing some changes to help stabilize things.
Attachment #8882730 - Flags: review?(jmathies) → review+
These popups were showing up off screen. Combining that with the new accelerated rendering backend caused this test to fail about 50% of the time. Moving the parent element into view addresses the problem by insuring the window shows up on the desktop.

before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c51c0da596d83bd383c3f34f81a6abfe402ac48c
after:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=97a8b2ce60daf151bd4c10361bd85087a323c5cc
Attachment #8883095 - Flags: review?(mratcliffe)
(In reply to Jim Mathies [:jimm] from comment #70)
> Created attachment 8883095 [details] [diff] [review]
> Part 5: Touch up flaky devtools popup test
> 
> These popups were showing up off screen. Combining that with the new
> accelerated rendering backend caused this test to fail about 50% of the
> time. Moving the parent element into view addresses the problem by insuring
> the window shows up on the desktop.
> 
> before:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c51c0da596d83bd383c3f34f81a6abfe402ac48c
> after:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=97a8b2ce60daf151bd4c10361bd85087a323c5cc

I think I'll file a follow up on the offscreen window issue to investigate further. That work shouldn't hold up landing these fixes.
Attachment #8883095 - Flags: review?(mratcliffe) → review+
Pushed by jmathies@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/292cbc815dcc
Support remote layer trees in popups. r=kmag
https://hg.mozilla.org/integration/mozilla-inbound/rev/63b7c46bfec6
Enable off-main-thread compositing on Windows when possible. r=kmag
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6da01a927c9
Avoid intermittent debug assertions in browser chrome tests when creation of draw targets fails. r=jimm
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1181db72766
Touch ups for flaky dev tools popup tests. r=mratcliffe
I tested a couple of WebExtension popups in Nightly with this patch and couldn't get the popup working :(

Kris is investigating.
Reopening since this hasn't actually enabled remote layer tree support. It enabled OMTC, but the basic layer backend is still used for transparent popups, and it doesn't support remote layer trees.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Kris Maglione [:kmag] from comment #75)
> Reopening since this hasn't actually enabled remote layer tree support. It
> enabled OMTC, but the basic layer backend is still used for transparent
> popups, and it doesn't support remote layer trees.

Are you sure? There's two things called "Basic" involved, BasicLayerManager (main thread software compositing), and ClientLayerManager with a "Basic" backend (OMTC, but using software compositing). The patches in this bug switched us from BasicLayerManager to OMTC+Basic, which I assume is what we wanted, and will work with e10s.

The reason we use the software backend here, is because our transparent windows use WS_EX_LAYERED, which does not work with DirectX [1]. Transparent widgets have to disable acceleration [2][3], and as a result they won't get D3D11. That's okay.

Though, these patches broke WebRender and Advanced Layers. Both WR/AL skip this logic and weren't set up to reject transparent windows. But those are separate bugs in those systems.

[1] https://cs.chromium.org/chromium/src/ui/views/widget/widget_hwnd_utils.cc?q=WS_EX_LAYERED&sq=package:chromium&dr=C&l=140
[2] http://searchfox.org/mozilla-central/source/widget/windows/nsWindow.cpp#8149
[3] http://searchfox.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#2706
(In reply to David Anderson [:dvander] from comment #76)
> Are you sure? There's two things called "Basic" involved, BasicLayerManager
> (main thread software compositing), and ClientLayerManager with a "Basic"
> backend (OMTC, but using software compositing). The patches in this bug
> switched us from BasicLayerManager to OMTC+Basic, which I assume is what we
> wanted, and will work with e10s.

I'm sure that it doesn't work, and I'm sure that disabling transparency and
enabling acceleration fixes the problem. I'm not sure of the exact details of
why it doesn't work yet, but I'm still looking into it.
It looks like the problem only happens when the popup uses the basic layer manager but the parent browser window uses D3D. Which would explain which is why it worked for some users who I asked to test but for not others.
OK, it turns out the problem is that we're still using the wrong layer manager in some places in RenderFrameParent, which mostly works when both the popup and the top-level browser window use the same type of compositor, but not otherwise.

So I'm going to call this fixed and address that in a follow-up.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Blocks: 1379046
Depends on: 1379085
Blocks: 1379085
No longer depends on: 1379085
(In reply to Kris Maglione [:kmag] from comment #79)
> OK, it turns out the problem is that we're still using the wrong layer
> manager in some places in RenderFrameParent, which mostly works when both
> the popup and the top-level browser window use the same type of compositor,
> but not otherwise.
> 
> So I'm going to call this fixed and address that in a follow-up.

Any clue as to why this situation fails? Also, I'm curious, what fails exactly? Painting of the popup?
(In reply to Jim Mathies [:jimm] from comment #80)
> (In reply to Kris Maglione [:kmag] from comment #79)
> > OK, it turns out the problem is that we're still using the wrong layer
> > manager in some places in RenderFrameParent, which mostly works when both
> > the popup and the top-level browser window use the same type of compositor,
> > but not otherwise.
> > 
> > So I'm going to call this fixed and address that in a follow-up.
> 
> Any clue as to why this situation fails?

I didn't look into it in detail, since the fix was pretty obvious, but I think it's mainly because we wind up trying to use the d3d texture factory of the top-level window rather than the basic texture factory of the popup. I'm actually more curious as to why it works when they're the same type.

> Also, I'm curious, what fails exactly? Painting of the popup?

The popups paint, but the remote content does not paint correctly. The behavior is a bit odd, actually, at least on slow systems. There are some flickers where there are black or transparent areas where images from the remote browser should be painted, and so forth. But after a few hundred ms or so, it just winds up blank.
Comment on attachment 8882730 [details] [diff] [review]
Part 4: Fix debug asserts in tests

let's uplift this one patch to fix bug 1379085 on beta
Attachment #8882730 - Flags: approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/0a4579074942

I'm letting Bugherder set the status for 55 to fixed so that it doesn't show up on the needs-uplift radar still, but the real fix from this bug is quite clearly riding the 56 train. For future reference, it probably would have made more sense to just upload the one patch we cared about to bug 1379085 for separate tracking and approval.
Depends on: 1379940
Attached image 2017-07-31_1659.png
Manually verified this issue on Firefox 56.0a1 (2017-07-30/31) under Windows 10 64-bit and Mac OS X 10.12.3. The webextension panel content is successfully loaded.  
This issue could not be verified in Firefox 55 because not all patches were uplifted.
Status: RESOLVED → VERIFIED
Performance Impact: --- → P1
Whiteboard: tpi:+[qf:p1] → tpi:+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: